Multi-prototype Chinese Character Embedding

Authors

  • Yanan Lu
  • Yue Zhang
  • Dong-Hong Ji
Abstract

Chinese sentences are written as sequences of characters, which are elementary units of syntax and semantics. Characters are highly polysemous when forming words. We present a position-sensitive skip-gram model to learn multi-prototype Chinese character embeddings, and explore the usefulness of such character embeddings for Chinese NLP tasks. Evaluation on character similarity shows that multi-prototype embeddings are significantly better than a single-prototype baseline. In addition, when used as features in the Chinese NER task, the embeddings yield a 1.74% F-score improvement over a state-of-the-art baseline.
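
The abstract does not spell out how the position-sensitive model is built. The following is only a minimal sketch of one possible reading, under the assumption that "position-sensitive" means each character is distinguished by where it sits inside a segmented word (begin, middle, end, or single), so that one character can receive several embeddings. The tag set and the function names `tag_positions` and `skipgram_pairs` are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple


def tag_positions(words: List[str]) -> List[str]:
    """Turn segmented words into position-tagged character tokens.

    Assumed tag set (not from the paper): B = word-initial, M = word-internal,
    E = word-final, S = single-character word.
    """
    tokens: List[str] = []
    for word in words:
        if len(word) == 1:
            tokens.append(f"{word}/S")
            continue
        last = len(word) - 1
        for i, ch in enumerate(word):
            if i == 0:
                tag = "B"
            elif i == last:
                tag = "E"
            else:
                tag = "M"
            tokens.append(f"{ch}/{tag}")
    return tokens


def skipgram_pairs(tokens: List[str], window: int = 2) -> List[Tuple[str, str]]:
    """Enumerate (target, context) pairs within a symmetric window."""
    pairs: List[Tuple[str, str]] = []
    for i, target in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((target, tokens[j]))
    return pairs


if __name__ == "__main__":
    # The same character gets a different token depending on its position.
    sentence = ["中国", "国家", "人"]
    tagged = tag_positions(sentence)
    print(tagged)
    print(skipgram_pairs(tagged, window=1))
```

Under this assumption, the tagged tokens could be passed to any standard skip-gram trainer, which would then learn a separate vector for each (character, position) pair. Whether the paper defines its prototypes this way is not stated in the abstract.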

Similar resources

A Prototype of Multi-Font Printed Chinese Character Reader

An approach to multi-font printed Chinese character recognition is proposed in this paper. The problems of inputting images of characters, preprocessing, character segmentation, feature extraction as well as character classification have been discussed. According to the characteristics of multi-font printed Chinese characters, the number of cutting across strokes, the external and internal areas w...

Multi-Granularity Chinese Word Embedding

This paper considers the problem of learning Chinese word embeddings. In contrast to English, a Chinese word is usually composed of characters, and most of the characters themselves can be further divided into components such as radicals. While characters and radicals contain rich information and are capable of indicating semantic meanings of words, they have not been fully exploited by existin...

Radical-Enhanced Chinese Character Embedding

We present a method to leverage radicals for learning Chinese character embeddings. A radical is a semantic and phonetic component of a Chinese character. It plays an important role, as characters with the same radical usually have similar semantic meaning and grammatical usage. However, existing Chinese processing algorithms typically regard the word or character as the basic unit but ignore the crucial ...

A Study on BLSTM-RNN-based Chinese Prosodic Structure Prediction in a Unified Framework with Character-level Features

In a text-to-speech system, prosodic attributes have to be predicted from the input text alone. The accuracy of prosody prediction has a significant effect on the naturalness of synthesized Chinese speech. In this paper, we explore using neural networks to predict prosodic boundaries from Chinese text without task-specific knowledge or sophisticated feature engineering. We examine sequence characte...

Neural Domain Adaptation with Contextualized Character Embedding for Chinese Word Segmentation

There is a large amount of annotated newswire data for Chinese word segmentation. However, some research shows that segmenter performance decreases significantly when a model trained on newswire is applied to other domains, such as patents and literature. The same character appearing in different words may occupy different positions and carry different meanings. In this paper, we introdu...

Publication date: 2016